contraction property
- North America > Canada > Alberta (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Entropic Mirror Monte Carlo
Cherradi, Anas, Janati, Yazid, Durmus, Alain, Corff, Sylvain Le, Petetin, Yohan, Stoehr, Julien
Importance sampling is a Monte Carlo method which designs estimators of expectations under a target distribution using weighted samples from a proposal distribution. When the target distribution is complex, such as multimodal distributions in highdimensional spaces, the efficiency of importance sampling critically depends on the choice of the proposal distribution. In this paper, we propose a novel adaptive scheme for the construction of efficient proposal distributions. Our algorithm promotes efficient exploration of the target distribution by combining global sampling mechanisms with a delayed weighting procedure. The proposed weighting mechanism plays a key role by enabling rapid resampling in regions where the proposal distribution is poorly adapted to the target. Our sampling algorithm is shown to be geometrically convergent under mild assumptions and is illustrated through various numerical experiments.
- North America > Canada > Alberta (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training Chia-Y u Chen
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing methods do not scale well to large scale distributed systems (due to gradient build-up) and/or fail to evaluate model fidelity (test accuracy) on large datasets.
- North America > United States (0.04)
- North America > Canada (0.04)
Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning
Xu, Yang, Ganesh, Swetha, Aggarwal, Vaneet
We present the first $Q$-learning and actor-critic algorithms for robust average reward Markov Decision Processes (MDPs) with non-asymptotic convergence under contamination, TV distance and Wasserstein distance uncertainty sets. We show that the robust $Q$ Bellman operator is a strict contractive mapping with respect to a carefully constructed semi-norm with constant functions being quotiented out. This property supports a stochastic approximation update, that learns the optimal robust $Q$ function in $\tilde{\cO}(ε^{-2})$ samples. We also show that the same idea can be used for robust $Q$ function estimation, which can be further used for critic estimation. Coupling it with theories in robust policy mirror descent update, we present a natural actor-critic algorithm that attains an $ε$-optimal robust policy in $\tilde{\cO}(ε^{-3})$ samples. These results advance the theory of distributionally robust reinforcement learning in the average reward setting.
- North America > United States (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
Denoising-based Contractive Imitation Learning
Shen, Macheng, Peng, Jishen, Huang, Zefang
A fundamental challenge in imitation learning is the \emph{covariate shift} problem. Existing methods to mitigate covariate shift often require additional expert interactions, access to environment dynamics, or complex adversarial training, which may not be practical in real-world applications. In this paper, we propose a simple yet effective method (DeCIL) to mitigate covariate shift by incorporating a denoising mechanism that enhances the contraction properties of the state transition mapping. Our approach involves training two neural networks: a dynamics model ( f ) that predicts the next state from the current state, and a joint state-action denoising policy network ( d ) that refines this state prediction via denoising and outputs the corresponding action. We provide theoretical analysis showing that the denoising network acts as a local contraction mapping, reducing the error propagation of the state transition and improving stability. Our method is straightforward to implement and can be easily integrated with existing imitation learning frameworks without requiring additional expert data or complex modifications to the training procedure. Empirical results demonstrate that our approach effectively improves success rate of various imitation learning tasks under noise perturbation.
Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
Xu, Yang, Mondal, Washim Uddin, Aggarwal, Vaneet
We present the first finite-sample analysis for policy evaluation in robust average-reward Markov Decision Processes (MDPs). Prior works in this setting have established only asymptotic convergence guarantees, leaving open the question of sample complexity. In this work, we address this gap by establishing that the robust Bellman operator is a contraction under the span semi-norm, and developing a stochastic approximation framework with controlled bias. Our approach builds upon Multi-Level Monte Carlo (MLMC) techniques to estimate the robust Bellman operator efficiently. To overcome the infinite expected sample complexity inherent in standard MLMC, we introduce a truncation mechanism based on a geometric distribution, ensuring a finite constant sample complexity while maintaining a small bias that decays exponentially with the truncation level. Our method achieves the order-optimal sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$ for robust policy evaluation and robust average reward estimation, marking a significant advancement in robust reinforcement learning theory.
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > India > Uttar Pradesh > Kanpur (0.04)
ICODE: Modeling Dynamical Systems with Extrinsic Input Information
Li, Zhaoyi, Mei, Wenjie, Yu, Ke, Bai, Yang, Li, Shihua
Learning models of dynamical systems with external inputs, that may be, for example, nonsmooth or piecewise, is crucial for studying complex phenomena and predicting future state evolution, which is essential for applications such as safety guarantees and decision-making. In this work, we introduce \emph{Input Concomitant Neural ODEs (ICODEs)}, which incorporate precise real-time input information into the learning process of the models, rather than treating the inputs as hidden parameters to be learned. The sufficient conditions to ensure the model's contraction property are provided to guarantee that system trajectories of the trained model converge to a fixed point, regardless of initial conditions across different training processes. We validate our method through experiments on several representative real dynamics: Single-link robot, DC-to-DC converter, motion dynamics of a rigid body, Rabinovich-Fabrikant equation, Glycolytic-glycogenolytic pathway model, and heat conduction equation. The experimental results demonstrate that our proposed ICODEs efficiently learn the ground truth systems, achieving superior prediction performance under both typical and atypical inputs. This work offers a valuable class of neural ODE models for understanding physical systems with explicit external input information, with potential promising applications in fields such as physics and robotics.
- Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)